Telugu Large Language Model Project

School of Computing
University of Silicon Andhra

Karthik Uppuluri and Venkat Gudivada

Corpus Database

  • Maintains curated Telugu documents of different genres

  • Hosted on a PostgreSQL server

  • Foundation for everything else
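A minimal sketch of how curated documents might be stored by genre. The slides do not give a schema, so the `documents` table, its columns, and the sample rows are all hypothetical. To keep the example self-contained it uses Python's built-in `sqlite3` in place of the PostgreSQL server named above.

```python
import sqlite3

# Hypothetical schema: table and column names are assumptions, not the project's.
SCHEMA = """
CREATE TABLE documents (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    genre TEXT NOT NULL,
    body  TEXT NOT NULL
)
"""

def build_corpus(rows):
    """Create an in-memory database and load (title, genre, body) rows."""
    conn = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL server
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO documents (title, genre, body) VALUES (?, ?, ?)", rows
    )
    conn.commit()
    return conn

def count_by_genre(conn):
    """Return a {genre: document count} mapping."""
    cur = conn.execute(
        "SELECT genre, COUNT(*) FROM documents GROUP BY genre ORDER BY genre"
    )
    return dict(cur.fetchall())

# Placeholder rows for illustration only.
conn = build_corpus([
    ("Sample news item 1", "news", "placeholder text"),
    ("Sample news item 2", "news", "placeholder text"),
    ("Sample poem", "poetry", "placeholder text"),
])
genre_counts = count_by_genre(conn)
```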

Web Application

  • Provides a workflow system for curating Telugu documents

  • Role-based access control
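Role-based access control can be sketched as a mapping from roles to permitted actions. The role names and action names below are assumptions chosen for illustration; the slides do not list the actual roles in the curation workflow.

```python
from enum import Enum

class Role(Enum):
    # Hypothetical roles; the real workflow may define different ones.
    CONTRIBUTOR = "contributor"
    REVIEWER = "reviewer"
    ADMIN = "admin"

# Hypothetical actions each role may perform in the curation workflow.
PERMISSIONS = {
    Role.CONTRIBUTOR: {"submit_document"},
    Role.REVIEWER: {"submit_document", "review_document"},
    Role.ADMIN: {
        "submit_document",
        "review_document",
        "approve_document",
        "manage_users",
    },
}

def is_allowed(role, action):
    """Return True if the given role is permitted to perform the action."""
    return action in PERMISSIONS.get(role, set())
```

For example, `is_allowed(Role.REVIEWER, "review_document")` is true, while `is_allowed(Role.CONTRIBUTOR, "approve_document")` is false.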

Technical Volunteers for Developing the Web Application

  • React.js

  • Backend framework: Django (secure data and content management) vs. Node.js (real-time data processing and scalability under heavy traffic)

  • Software Testing

  • PostgreSQL Administration

  • Linux Server Administration

  • Doccano Server Administration

Telugu Linguists

  • Annotation using the Doccano Server

  • Coarse-grained vs. fine-grained POS tags

  • Paraphrase detection: binary classification (yes/no)

  • Textual entailment: binary (yes/no) vs. ternary (yes/no/neutral) classification

  • Annotated data for question answering, text summarization, and text classification
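The labeling choices above can be sketched as small label sets. This is an illustrative sketch only: the fine-grained POS tag names are examples in the style of a hierarchical tagset, not necessarily the tagset the project uses, and collapsing "neutral" into "no" is one assumed way to relate the ternary and binary entailment schemes.

```python
# Label sets taken from the slide text.
PARAPHRASE_LABELS = ("yes", "no")
ENTAILMENT_BINARY = ("yes", "no")
ENTAILMENT_TERNARY = ("yes", "no", "neutral")

def is_valid(label, scheme):
    """Check that an annotator's label belongs to the chosen scheme."""
    return label in scheme

def collapse_entailment(label):
    """Map a ternary entailment label to the binary scheme.

    Assumption: "neutral" is treated as "no" (not entailed).
    """
    if not is_valid(label, ENTAILMENT_TERNARY):
        raise ValueError(f"unknown entailment label: {label!r}")
    return "yes" if label == "yes" else "no"

def coarse_tag(fine_tag):
    """Derive a coarse-grained POS tag from a hierarchical fine-grained tag.

    Assumption: fine tags look like "N_NNP", where the part before the
    underscore is the coarse category.
    """
    return fine_tag.split("_", 1)[0]

# Illustrative fine-grained tags (hypothetical for this project).
fine_tags = ["N_NN", "N_NNP", "V_VM", "V_VAUX"]
coarse_tags = [coarse_tag(t) for t in fine_tags]
```

A ternary scheme keeps more information, since it can always be collapsed to binary afterwards, whereas binary annotations cannot be expanded without re-annotation. The same holds for fine-grained versus coarse-grained POS tags.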

Demos

Additional Information

Questions